OK, good afternoon. The topic today is an SPDK-based user-space NVMe over TCP transport solution, and it assumes you already know the NVMe over Fabrics protocol fairly well. Here is the agenda: first the development history and status of the SPDK NVMe-oF target, then an introduction to the SPDK TCP transport, and finally a conclusion.

Before the SPDK NVMe-oF target timeline, let's talk about why we have a fabrics protocol at all. Previously, machines used the SCSI protocol to drive SSDs, at least at the upper layer. For PCIe SSDs we now have the NVMe protocol, which is more lightweight. If we expose such a fast device over the network but keep using a SCSI-based protocol for remote access, there is a lot of overhead: NVMe commands have to be translated to SCSI and transported to the target side, where the SCSI protocol has to be parsed and translated back to NVMe. That overhead is large and costly. So on PCIe SSD hardware we moved from the SCSI protocol to the NVMe protocol, and in order to better serve remote access, we have NVMe over Fabrics.

This is the history. The NVMe over Fabrics protocol is quite young, dating from 2016, and it is still evolving. Today there are two different solutions, the kernel one and the SPDK one, and they are fully interoperable.

This is the SPDK NVMe-oF target timeline, starting from July 2016. Through 2017, from March to July, we did functional hardening and ran interoperability tests against the kernel. From November 2017 to November 2018 we worked on RDMA transport improvements, and we are still doing that. In January this year we released TCP transport support, and from April this year onward we have continued with improvement and optimization: first, the NVMe-oF support is kept current as the spec and code are renewed; second, to be more compatible with the kernel, we keep improving and testing interoperability and performance; and third, based on SPDK profiling, we keep improving performance.

This page shows the SPDK NVMe-oF host, or initiator, development timeline. I won't go into great detail here; it is similar to the target side.

This is the SPDK NVMe Fabrics target server-side design, and these are the details we use to show that an NVMe Fabrics target built on SPDK has guaranteed performance. The first point is important: the SPDK target uses a user-space NVMe driver. Consider the kernel path: if an application wants to access an NVMe SSD, every disk read and write goes through system calls and the file system, and then down the whole I/O stack; even for a fast device, the path is lengthy. We also have to context-switch between user processes, and the kernel has several locks in its I/O stack, so there is heavy contention for resources and the performance is not very good. If we use the user-space NVMe driver to drive the whole process instead, we solve this problem: the number of context switches and the number of I/O stack layers are both reduced greatly.
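As a concrete illustration of this first point, here is a minimal sketch of reading one block through SPDK's user-space, polled-mode NVMe driver. This is not code from the talk; it assumes the standard spdk_nvme_* probe/attach API, with error handling elided and option arguments left at defaults:

```c
#include <stdbool.h>
#include "spdk/env.h"
#include "spdk/nvme.h"

static struct spdk_nvme_ctrlr *g_ctrlr;

static bool
probe_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
         struct spdk_nvme_ctrlr_opts *opts)
{
	return true; /* attach to every NVMe controller found on PCIe */
}

static void
attach_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
          struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
	g_ctrlr = ctrlr;
}

static void
read_done(void *arg, const struct spdk_nvme_cpl *cpl)
{
	*(bool *)arg = true;
}

int
main(void)
{
	struct spdk_env_opts opts;
	bool done = false;

	spdk_env_opts_init(&opts);
	spdk_env_init(&opts);  /* hugepages and PCI mapping; no kernel I/O path */
	spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);

	struct spdk_nvme_ns *ns = spdk_nvme_ctrlr_get_ns(g_ctrlr, 1);
	/* Each thread allocates and owns its queue pair: no locks, no sharing. */
	struct spdk_nvme_qpair *qp = spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);
	void *buf = spdk_dma_zmalloc(4096, 4096, NULL);

	spdk_nvme_ns_cmd_read(ns, qp, buf, 0 /* LBA */, 1 /* block count */,
			      read_done, &done, 0);
	while (!done) {
		/* Poll for completions instead of taking interrupts. */
		spdk_nvme_qpair_process_completions(qp, 0);
	}
	return 0;
}
```

The key property is that submission and completion both happen on the calling thread, which is what removes the system calls, context switches, and locks just described.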
And I/O submission is not blocked by contention for resources. On the host side, when we create I/O queue pairs, each queue pair is divided into a submission queue and a completion queue, and an application thread controls the queue pairs it created itself. Since every thread operates only on its own queue pairs, submissions and completions never compete for resources. One precondition is that you drive the device with Linux UIO, so you can memory-map it, or with VFIO; either is workable. That is the first point. This user-space NVMe driver has now been verified by many users and has been put into production environments and tested.

Second, the driver alone is not sufficient; we also need to provide a framework for programming. This is the SPDK encapsulated socket API, and for the NVMe-oF target, the polling-group idea. When the SPDK target starts, we create a thread on each CPU core, and there is one and only one thread running on each core. Each thread runs a group-poll function whose main job is to process all the connections in its group. Whatever connection is handed over from the transport, once we accept it, it is scheduled onto one specific CPU core for processing. If we start the target with only one CPU core, that means we have only one group; if we start it with multiple cores, each core has its own thread. Once a connection is taken over by a polling group, the polling group on that CPU operates only on its assigned connections, never on connections handled elsewhere (a minimal sketch of this loop appears at the end of this section). To a great extent this minimizes the competition for resources. How connections are assigned to polling groups is determined by an algorithm, and the current algorithm we are using is sufficient.

The third point follows from the second: one connection is mapped to one NVMe I/O queue pair, which guarantees that a single CPU handles that connection and there is no competition for resources. And the fourth point: NVMe command handling on the target side is spread across these per-core threads and there is no lock, which improves performance to a great extent, especially at high I/O queue depth. So that is the overall SPDK NVMe-oF target situation.

Now let's look at the transports SPDK supports. In SPDK, the transport is a wrapper, an abstraction, and the NVMe over Fabrics protocol defines several transports: for example Fibre Channel, RDMA, and TCP. The parts shown in green on the slide are already supported; the Fibre Channel transport is being developed externally and is still in review, and you can follow its progress.

For the TCP transport, SPDK supports two TCP implementations: one based on the kernel's sockets and one based on VPP, Vector Packet Processing, two different categories. Why do we keep both interfaces? The main reason is that the SPDK target runs entirely in user space, and if you call into the kernel TCP stack you have to pay for system calls. The ultimate goal is that when an NVMe command from the initiator comes in through the network card, handling that command does not involve the kernel stack at all.
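Before turning to that user-space TCP path, here is a structural sketch of the polling-group loop described above, written with plain Linux epoll. The names (poll_group_t, conn_t, handle_conn) are hypothetical rather than SPDK's; the point is one group per core, with each connection owned by exactly one group:

```c
#include <sys/epoll.h>

/* Hypothetical per-connection state; a connection belongs to exactly ONE group. */
typedef struct {
	int fd;
	/* ... queue pair and PDU parsing state ... */
} conn_t;

/* One polling group per CPU core, backed by one epoll instance. */
typedef struct {
	int epfd;
} poll_group_t;

static void
group_add_conn(poll_group_t *g, conn_t *c)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.ptr = c };
	epoll_ctl(g->epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

static void
handle_conn(conn_t *c)
{
	/* Parse incoming PDUs and advance the request state machine. */
	(void)c;
}

/* Run-to-completion loop pinned to one core: no locks, no cross-core sharing. */
static void
group_poll(poll_group_t *g)
{
	struct epoll_event evs[64];
	int n = epoll_wait(g->epfd, evs, 64, 0 /* poll, don't block */);

	for (int i = 0; i < n; i++) {
		handle_conn((conn_t *)evs[i].data.ptr);
	}
}
```

Because group_poll only ever touches connections added to its own epoll instance, two cores never contend on the same connection, which is exactly the lock-free property the design points above rely on.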
With a user-space TCP stack, an incoming TCP packet is handled directly by the PMD, the poll-mode driver: we receive the packets, parse the packet headers, extract the NVMe command, and pass it up the user-space stack, then hand responses back to the user-space protocol the same way. With the DPDK poll-mode drivers underneath, the whole stack guarantees that everything runs through user space, and zero-copy I/O can also be realized: when an I/O comes in through the network card, we assign it a buffer, and that same buffer is used through the whole execution of the I/O. That is the ideal state.

So first we support the TCP transport on kernel sockets. We also have the VPP integration, which is still in development, and it still has several issues to handle: when the VPP user-space stack is moved to other platforms it is not that stable, so we need to fix a lot of bugs, and the upstreaming work is still in progress. Also, VPP is not an embeddable library; it cannot be linked into SPDK, so we have two processes, the SPDK NVMe-oF target process and the VPP process. Data coming in is collected by VPP, processed by its TCP stack, then passed to the target through shared memory, and vice versa, so the performance is not ideal. That is also why we have a wrapped TCP interface: users can use the SPDK socket API wrapper to integrate other user-space stacks, for example Tencent's F-Stack. These options are all on the table, and there is a lot of room for optimization.

The target-side design of the TCP transport is pretty much like the previous slides; it follows the programming framework we have already talked about. For TCP transport performance optimization there are still a lot of issues we can work on. First, SPDK NVMe-oF uses the polling group for all socket management, and we use epoll: when we first accept a connection we put it into the epoll group, and afterwards we monitor data-in events on its socket; on Linux, the epoll_wait system call tells us which connections are active, and those are the ones we process. Second, as mentioned, for TCP connection optimization we use the SPDK socket wrapper to integrate other stacks. There are also offload libraries provided by NIC vendors; for example, Mellanox has its VMA messaging API, which is also a socket-style interface and lets us offload the TCP workload onto its own network cards. There are preconditions to doing this: first, to use it we LD_PRELOAD the library so that it replaces the TCP socket calls from glibc; second, we need hardware support on both sides; and third, its TCP handling is comparatively simple, so once the stack is fixed we have to check all of these situations.

Finally, for NVMe/TCP request lifecycle management we use a state machine. We can realize everything this way while guaranteeing performance and load balance, and it saves the memory space and other resources we consume. Here are the details: this one is the target side's PDU-receiving state machine, and it follows the spec. I have put the slides on the internet so you can check them yourself, and we will also post them on WeChat.
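As an illustration of that lifecycle state machine, here is a simplified sketch. The state names and transitions are paraphrased from the description that follows, not SPDK's exact enum (the talk counts five states; the free state is shown explicitly here):

```c
/* Simplified NVMe/TCP request lifecycle on the target (names illustrative). */
enum req_state {
	REQ_FREE,          /* sitting in the free pool                          */
	REQ_NEW,           /* command capsule received, request allocated       */
	REQ_NEED_BUFFER,   /* waiting for a data buffer                         */
	REQ_TRANSFER_R2T,  /* write without in-capsule data: R2T sent, waiting  */
	REQ_EXECUTING,     /* handed to the NVMe driver for execution           */
	REQ_COMPLETED      /* response sent; recycle back to REQ_FREE           */
};

struct req {
	enum req_state state;
	int is_write;
	int in_capsule;    /* did the write data arrive inside the command PDU? */
};

/* One step of the state machine; errors (the red dotted lines) would jump
 * straight to REQ_COMPLETED. */
static void
req_advance(struct req *r)
{
	switch (r->state) {
	case REQ_NEW:
		r->state = REQ_NEED_BUFFER;
		break;
	case REQ_NEED_BUFFER:
		if (!r->is_write || r->in_capsule)
			r->state = REQ_EXECUTING;    /* reads, or in-capsule writes */
		else
			r->state = REQ_TRANSFER_R2T; /* solicit data from the host  */
		break;
	case REQ_TRANSFER_R2T:
		r->state = REQ_EXECUTING;            /* all data received           */
		break;
	case REQ_EXECUTING:
		r->state = REQ_COMPLETED;
		break;
	default:
		r->state = REQ_FREE;
		break;
	}
}
```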
Looking at the diagram: an NVMe/TCP request's life cycle runs from the time the target receives the NVMe command, and it goes through five states. When a request is allocated it is new, and it passes through the state machine before going into the NVMe driver for execution. If it needs a data buffer before execution, we check whether it is a read or a write command. If it is a read, then once the buffer has been allocated it can go straight to the NVMe driver. If it is a write and the data came as in-capsule data in the PDU, we can take the data straight out of the command; otherwise we have to trigger an R2T, and when we receive the initiator's data we give the request to the driver to execute, and afterwards the request goes back to the initial state. That is the whole process; the red dotted lines are the error paths. It is similar to the standard flow in the spec, so this should be helpful when you read the code.

We mentioned performance, so we ran performance tests. The setup consists of three machines. One is the target: it has 16 NVMe disks, which can be put into one subsystem and exported. On the initiator side, one machine can open 1 to 16 connections, and with two machines there can be as many as 32 connections.

On this page you can see the I/O scaling performance. We divide the target side into four configurations, and each initiator sends 16 connections, so in total there are 32 connections. In the left chart the queue depth is 1, and in the right chart the queue depth is 32. The line is latency: you can see IOPS increase as cores are added, and from the third to the fourth configuration there is a decrease in latency. As the CPU cores scale up, the connection-handling load on each core is reduced, so the performance increases as a result. On the initiator side we use FIO with the SPDK plugin; if we increase the CPU cores, you can see the performance increase, so the trend is similar to the target side, and it is similar for the kernel.

Next is latency, which is really a test of the network-stack performance, so for the backend storage we use a null block device; the same test can also be conducted against the kernel. The baseline is a kernel target with a kernel initiator. On the left side we keep the kernel initiator and switch the target: the blue bars use SPDK's target instead of the kernel's, and you can see a very obvious latency decrease, about 20-30%. Then, keeping SPDK's target and switching the initiator between the kernel's and SPDK's, there is a further relative latency decrease of about 30-50%. If we merge these two graphs, we can compare the two whole solutions, the pure kernel solution and the pure SPDK solution, and the latency shows a very obvious, sharp decrease of about 30%.

This last comparison is kernel versus SPDK as we use different numbers of connections to evaluate performance. The workload is 4 KB random I/O, 70% reads and 30% writes. You can see that in the same hardware environment, with the same CPUs in control, SPDK's target performance is over 2.5 times that of the kernel's. This shows that under the same CPU configuration, SPDK uses fewer CPU resources to complete the same task. For hyper-converged infrastructure this is very good,
because you can save CPU for running more virtual machines and more vCPUs. So yes, this saves a lot of cost, and on top of that the performance improves.

On the next page there is a further step: apart from using third-party software features, we also want to use hardware features, like Intel's next-generation 100 GbE NIC, which supports ADQ. Under high IOPS it can still improve tail latency. The second and third lines are details about how to isolate the hardware queues; the queues also need software support. There are some technical requirements: the kernel NIC driver needs to support the busy-polling socket APIs, which exist in newer kernels, and in the application we need to use the epoll mechanism to group connections that share the same NAPI ID. To tell whether a connection has the same NAPI ID as another, we query the SO_INCOMING_NAPI_ID socket option; we just read this option. On the hardware side we also need support for filtering, to steer and control the traffic. For the NIC part, we are still tuning this work as Intel releases the new NIC, and I think these features will be integrated into the software, so that people using the NIC can simply enjoy the benefits.
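Here is a small sketch of the NAPI-ID query just described, assuming a Linux kernel new enough (4.12 or later) to expose SO_INCOMING_NAPI_ID:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/* Query which NIC hardware queue (NAPI context) a connection arrives on.
 * Connections reporting the same NAPI ID can be polled by the same epoll
 * group, keeping one ADQ hardware queue per polling thread. */
static uint32_t
get_napi_id(int fd)
{
	uint32_t napi_id = 0;
	socklen_t len = sizeof(napi_id);

	if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len) < 0) {
		perror("SO_INCOMING_NAPI_ID");
		return 0; /* 0 means unknown; the option needs kernel 4.12+ */
	}
	return napi_id;
}
```

Each accepted socket can then be added to the polling group whose sockets share its NAPI ID, so that one thread services exactly one hardware queue.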
We also have some further development plans. The software is open source and it is not yet perfect, so we need to cooperate with the Linux kernel for more interoperability, and with the NICs, and keep testing; we need to continue the performance tuning; we want deep integration with third-party stacks; and we also need to use the hardware features and provide a better offloading API that integrates seamlessly with SPDK.

So this is the conclusion. We have walked through the history of the SPDK NVMe-oF solution, focused on introducing the TCP transport, its realization and some details, and shown performance tests demonstrating that SPDK's performance is good. If you use the SPDK solution, you don't need to upgrade the kernel to realize and enjoy these benefits; but if you want to use the latest kernel NVMe/TCP, you need to upgrade the kernel to 5.1 for a stable version. Lastly, welcome to take part in the SPDK community. There are different ways to participate: you can raise questions and file bugs, and if you are energetic and want to work on SPDK, you can offer patches. That is the end of my presentation. One last thing: on WeChat we have a community account that releases SPDK technical information, and we will release a new introduction to the SPDK NVMe/TCP transport with more details in it, so welcome, all of you, to read those articles on WeChat and give us some feedback. Thank you.

Question: I have three questions. You compared SPDK NVMe/TCP with the kernel, but your TCP transport was itself running on the kernel stack, wasn't it? Answer: Yes, we mentioned this. The reason SPDK is better than the kernel even when both parse packets through kernel TCP is, first, the SPDK programming framework: it is asynchronous and runs in parallel without synchronization, so under this programming model an incoming NVMe-oF connection in SPDK is handled by a dedicated CPU and is never switched between different CPUs, and there is no long lock path. Also, SPDK mainly uses its user-space NVMe driver, while kernel users still have to go through the kernel's driver, whose locking path is long; SPDK's user-space driver is lock-free and asynchronous, so this means that even using kernel TCP, it still performs better than the kernel target. And if you test a local raw disk under the same physical configuration, SPDK's driver performance is up to 10 times that of the kernel, which I think is a very good point to show why SPDK does better.

Question: We have tested SPDK ourselves; we think SPDK is the right direction, but our test results are worse than yours, around 140K. Answer: That has to do with the environment; it is more about how the load is generated. You can go to the SPDK website and find the 19.01 RDMA-based performance report; our environment is the same as that one, and the parameters are the same, so you can just refer to that report.

Question: I have another question; may I ask you in private? OK. Thank you, everyone.